Presenter: Tony Liang
October 31, 2024
Circulating free DNA (cfDNA) consists of DNA fragments released into the bloodstream
The fraction of cfDNA released from cancer or tumor cells is circulating tumor DNA (ctDNA)
It contains genetic and epigenetic changes and could reveal the cells from which it originated
Current cfDNA screening tests can detect the presence of abnormal signals but cannot tell the tumor’s origin, cancer type, or tissue of origin (TOO)
Limitations of existing methods
In some sense, MetDecode “combines” existing methodologies such as nonnegative least squares and matrix factorization
\[ f(A) \quad = \quad \sum\limits_{i=1}^n \sum\limits_{k=1}^p \quad W_{ik} \quad \Big| \underbrace{R_{ik}^{\text{(cfdna)}}}_{(1)} - \underbrace{\sum\limits_{j=1}^m A_{ij} B_{jk}}_{(2)}\Big| \]
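To make the objective above concrete, here is a minimal sketch that evaluates \(f(A)\) on a toy example. It assumes that \(A\) holds the per-sample cell-type contributions (\(n \times m\)), \(B\) holds the reference methylation values (\(m \times p\)), and \(W\) holds per-sample, per-marker weights; these readings and the function name `weighted_l1_objective` are illustrative assumptions, not MetDecode's actual code.

```python
def weighted_l1_objective(R_cfdna, A, B, W):
    """Evaluate f(A) = sum_i sum_k W[i][k] * |R_cfdna[i][k] - sum_j A[i][j] * B[j][k]|.

    Shapes (assumed): R_cfdna and W are n x p, A is n x m, B is m x p.
    """
    n, p, m = len(R_cfdna), len(R_cfdna[0]), len(B)
    total = 0.0
    for i in range(n):
        for k in range(p):
            # Term (2): methylation reconstructed from the contributions
            reconstructed = sum(A[i][j] * B[j][k] for j in range(m))
            # Term (1) minus term (2), weighted by W[i][k]
            total += W[i][k] * abs(R_cfdna[i][k] - reconstructed)
    return total


# Toy data (made up for illustration): 1 sample, 2 markers, 2 cell types
B = [[0.9, 0.1],
     [0.2, 0.8]]
A = [[0.5, 0.5]]
R_cfdna = [[0.60, 0.40]]
W = [[1.0, 2.0]]  # the second marker counts twice as much

loss = weighted_l1_objective(R_cfdna, A, B, W)
```

Both markers are off by 0.05 here, but the second one contributes twice as much to the loss because of its weight.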
Some math behind how MetDecode addresses unknown cell-type contributors
Accounts for unknown contributors in the cfDNA mixture by adding \(h\) extra rows to \(R^{\text{(atlas)}}\)
\[ R_{hk}^{\text{(atlas)}} = \begin{cases} R_k^{\text{lb}}, & e_k > 0 \\ R_k^{\text{ub}}, & \text{otherwise} \end{cases} \quad \text{where} \quad e_k = \text{median}_i \Big( -R_{ik}^{\text{(cfdna)}} + \sum\limits_{j} \alpha_{ij} R_{jk}^{\text{(atlas)}} \Big) \]
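The rule above can be sketched in a few lines. This is an illustration under assumptions: \(R_k^{\text{lb}}\) and \(R_k^{\text{ub}}\) are not defined on the slide, so they are passed in here as given per-marker lower and upper bounds, and the function name `unknown_row` is hypothetical.

```python
from statistics import median


def unknown_row(R_cfdna, alpha, R_atlas, lower, upper):
    """Build one extra atlas row for an unknown contributor.

    For each marker k, e_k is the median over samples i of
    (reconstruction - observation). If the atlas overshoots (e_k > 0)
    the row takes the lower bound, otherwise the upper bound.
    """
    n, p, m = len(R_cfdna), len(R_cfdna[0]), len(R_atlas)
    row = []
    for k in range(p):
        residuals = [
            sum(alpha[i][j] * R_atlas[j][k] for j in range(m)) - R_cfdna[i][k]
            for i in range(n)
        ]
        e_k = median(residuals)
        row.append(lower[k] if e_k > 0 else upper[k])
    return row


# Toy data (made up for illustration): 2 samples, 2 markers, 2 known cell types
R_atlas = [[0.9, 0.1],
           [0.2, 0.8]]
alpha = [[0.5, 0.5],
         [0.5, 0.5]]
R_cfdna = [[0.50, 0.60],
           [0.40, 0.70]]
lower = [0.0, 0.0]  # assumed bounds on the methylation ratio
upper = [1.0, 1.0]

new_row = unknown_row(R_cfdna, alpha, R_atlas, lower, upper)
```

In this toy case the known cell types overshoot the first marker and undershoot the second, so the extra row is pushed to the lower bound for the first marker and the upper bound for the second.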
Pearson correlation coefficient and mean squared error to evaluate MetDecode's estimates
Accuracy to evaluate multiclass cancer TOO prediction, and Cohen’s kappa to adjust for chance agreement across the multiple classes
Also looked into the limit of detection using in silico mixtures of tumor gDNA and healthy cfDNA
Ran 50 simulations, each containing \(5000\) simulated cfDNA samples.
Then computed the Pearson correlation coefficient for each deconvolution algorithm
Averaged over all correlation coefficients, MetDecode scored significantly higher than all other approaches
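The evaluation metrics listed above can be sketched with the standard library alone. This is a minimal illustration on made-up values, not the evaluation code used in the paper; the class labels `type_a` and `type_b` are placeholders.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between estimated and true contributions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(x, y):
    """Mean squared error between estimated and true contributions."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when p_e == 1, i.e. a single class throughout.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Toy values (made up for illustration)
estimated = [0.1, 0.2, 0.3]
truth = [0.2, 0.4, 0.6]
r = pearson(estimated, truth)
err = mse(estimated, truth)

y_true = ["type_a", "type_a", "type_b", "type_b"]
y_pred = ["type_a", "type_b", "type_b", "type_b"]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

The toy labels give an accuracy of 0.75 but a kappa of only 0.5, because half of the agreement would be expected by chance given the class frequencies.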
MetDecode with 1 unknown contributor performs best based on Cohen’s kappa
All methods perform equally poorly, with accuracy \(< 50\%\), when predicting all samples
Closer performance when looking at the \(19\) samples with tumor fraction \(> 3\%\)
How could one utilize cfDNA?
cfDNA epigenetic signatures can be used to deduce TOO or cancer type
MetDecode is an algorithm that estimates cell-type contributions and the type of cancer in a cfDNA sample
It models unknown contributors not present in the reference atlas
And accounts for coverage of each marker region to alleviate potential sources of noise
Limited number of cfDNA samples for the different cancer types
Another limit
Deconvoluting and defining the TOO will aid oncologists in identifying the tumor and directing treatment
Why does the weighting approach improve deconvolution accuracy only for cancer components and not for blood cell types?
Cell-type deconvolution still seems hard (low accuracy in predicting cancer type); what is the next step?
As an aside, can you always just combine existing approaches to get a “new” method?